Abstract

We revisit the problem of accurately answering large classes of statistical queries while preserving 
differential privacy. Previous approaches to this problem have either been very general but have not had 
run-time polynomial in the size of the database, have applied only to very limited classes of queries, 
or have relaxed the notion of worst-case error guarantees. In this paper we consider the large class of 
sparse queries, which take non-zero values on only polynomially many universe elements. We give 
efficient query release algorithms for this class, in both the interactive and the non-interactive setting. 
Our algorithms also achieve better accuracy bounds than previous general techniques do when applied 
to sparse queries: our bounds are independent of the universe size. In fact, even the runtime of our 
interactive mechanism is independent of the universe size, and so can be implemented in the "infinite 
universe" model in which no finite universe need be specified by the data curator. 



1 Introduction 

A database $\mathcal{D}$ represents a finite collection of individual records from some data universe $\mathcal{X}$, which
represents the set of all possible records. We typically think of $\mathcal{X}$ as being extremely large: exponentially large
in the size of the database, or in some cases, possibly even infinite. A fundamental task in private data
analysis is to accurately answer statistical queries about a database $\mathcal{D}$, while provably preserving the privacy
of the individuals whose records are contained in $\mathcal{D}$. The privacy solution concept we use in this paper is
differential privacy, which has become standard, and which we define in Section 2.

Accurately answering statistical queries is the most well-studied problem in differential privacy, and the
results to date come in two types. There are a large number of extremely general and powerful techniques
(see for example [BLR08, DNR+09, DRV10, RR10, HT10, HR10]) that can accurately answer arbitrary
families of statistical queries which can be exponentially large in the size of the database. Unfortunately,
these techniques all have running time that is at least linear in the size of the data universe $|\mathcal{X}|$ (i.e.
possibly exponential in the size of the database), and so are in many cases impractical. There are also several
techniques that do run in polynomial time, but that are limited: either they can answer queries from a very
general and structurally rich class (e.g. all low-sensitivity queries), but can only answer a linear number
of such queries (e.g. [DMNS06]), or they can answer a very large number of queries, but only from a
structurally very simple class (e.g. intervals on the unit line [BLR08]), or, as in several recent results (for
conjunction and parity queries respectively) [GHRU11, HRS11], they run in polynomial time, but offer only
average case guarantees for randomly chosen queries. One of the main open questions in data privacy is
to develop general data release techniques that run in polynomial time yet are comparable in power to the
known exponential time techniques. There is evidence, however, that this is not possible for arbitrary linear
queries [DNR+09, UV11, GHRU11].

In this paper, we consider a restricted but structurally rich class of linear queries which we call sparse 
queries. We say that a query is m-sparse if it takes non-zero values on at most m universe elements, and that
a class of queries is m-sparse if each query it contains is m'-sparse for some $m' \leq m$. We will typically
think of m as being some polynomial in the database size n. Note that although each individual query
is restricted to have support on only a polynomially sized subset of the data universe, different queries in
the same class can have different supports, and so a class of sparse queries can still have support over the
entire data universe. Note that the class of m-sparse queries is both very large (of size roughly $|\mathcal{X}|^m$), and
very structurally complex (the class of m-sparse queries has VC-dimension m). Sparse queries represent
questions about individuals whose answer is rarely "yes" when asked about an individual who is drawn 
uniformly at random from the data population. Nevertheless, such questions can be useful to a data analyst 
who has some knowledge about which segment of the population a database might be drawn from. For 
example, a database resulting from a medical study might contain individuals who have some rare disease, 
but the data analyst does not know which disease - although there may be many such queries, each one is 
sparse. Alternately, a data analyst might have knowledge about the participants of several previous studies, 
and might want to know how much overlap there is between the participants of each previous study and of 
the current study. In general, sparse queries will only be useful to a data analyst who has some knowledge 
about the database, beyond that it is merely a subset of an exponentially sized data universe. Our results can 
therefore be viewed as a way of privately releasing information about a database that is useful to specialists 
- but is privacy preserving no matter who makes use of it. In general, this work can be thought of as part of 
an agenda to find ways to make use of the domain knowledge of the data analyst, to make private analysis 
of large-scale data-sets feasible. 

1.1 Results 

We give two algorithms for releasing accurate answers to m-sparse queries while preserving differential 
privacy: one in the interactive setting, in which the data curator acts as an intermediary and must answer 
an adaptively chosen stream of queries as they arrive, and one in the non-interactive setting, in which the 
data curator must in one shot output a data-structure which encodes the answers to every query of interest. 
In the interactive setting, we require that the running time needed to answer each query is bounded by a
polynomial in n, the database size (so answering any sequence of k queries takes time $k \cdot \mathrm{poly}(n)$). In the
non-interactive setting, the entire computation must be performed in time polynomial in n, and the time
required to evaluate any query on the output data structure must also be polynomial. Therefore, from the
point of view of running time, the non-interactive setting is strictly more difficult than the interactive setting. 
In the interactive setting, we give the following utility bound: 

Theorem 1.1 (Informal, some parameters hidden). There exists an $(\varepsilon, \delta)$-differentially private query release
mechanism in the interactive setting, with running time per query $O(m/\alpha^2)$, that is $\alpha$-accurate with respect
to any set of k adaptively chosen m-sparse queries with:



In the non-interactive setting, we give the bound: 

Theorem 1.2 (Informal, some parameters hidden). There exists an $(\varepsilon, \delta)$-differentially private query release
mechanism in the non-interactive setting, with running time polynomial in the database size n, m, and
$\log|\mathcal{X}|$, that is $\alpha$-accurate with respect to any class of k m-sparse linear queries, with:



a = O log k 
Several aspects of these theorems are notable. First, the accuracy bounds do not have any dependence
on the size of the data universe $|\mathcal{X}|$, and instead depend only on the sparsity parameter m. Therefore, in
addition to efficiency improvements, these results give accuracy improvements for sparse queries, when
compared to the general purpose (inefficient) mechanisms for linear queries, which typically have accuracy
that depends on $\log|\mathcal{X}|$. Since we typically view $|\mathcal{X}|$ as exponentially large in the database size, whereas
m is only polynomially large in the database size for these algorithms to be efficient, this can be a large
improvement in accuracy.

Second, the interactive mechanism does not even have a dependence on $|\mathcal{X}|$ in its running time! In fact,
it works even in an infinite universe (e.g. data entries with string valued attributes without pre-specified
upper bound on length).² In this setting, queries may still be concisely specified as a list of polynomially
many individuals from the possibly infinite universe that satisfy the query. Moreover, because the accuracy
of this mechanism depends only very mildly on m, and the running time is linear in m, it can be used to
answer m-sparse queries for arbitrarily large polynomial values of m, where the mechanism is constrained
only by the available computational resources.

The non-interactive mechanism in contrast has a worse dependence on m. This bound essentially
matches the error that would result from releasing the perturbed histogram of the database, but does so
in a way that requires computation and output representation only polynomial in n (rather than linear in
$|\mathcal{X}|$, as releasing a histogram would require). Because accuracy bounds $\geq 1$ are trivial, this mechanism
only guarantees non-trivial accuracy for m-sparse queries with $m \ll n^2/\log k$. (This is still of course a
very large class of queries: there are roughly $|\mathcal{X}|^{n^2/\log k}$ such queries, i.e., super-exponentially many in n.)
Nevertheless, there are distinct advantages to having a non-interactive mechanism that only needs to be run
once. This is among the first polynomial time non-interactive mechanisms for answering an exponentially
large, unstructured class of queries while preserving differential privacy.

We note that our results give, as a corollary, more efficient algorithms for answering conjunctions with
many literals. This complements the beautiful recent work of Hardt, Rothblum, and Servedio [HRS11], who
give more efficient algorithms for answering conjunctions with few literals, based on reductions to threshold
learning problems.

1.2 Techniques 

Our interactive mechanism is a modification of the very general multiplicative weights mechanism of Hardt
and Rothblum [HR10]. We give the interactive mechanism via the framework of [GRU11], which efficiently
maps objects called iterative database constructions (IDCs, defined in Section 3) into private query release
mechanisms in the interactive setting. IDC algorithms are very similar to online learning algorithms in the mistake
bound model, and we use this analogy to implement a version of the multiplicative weights IDC of Hardt
and Rothblum [HR10] analogously to how the Winnow algorithm is implemented in the infinite attribute
model of learning, defined by Blum [Blu90]. The algorithm roughly works as follows: the multiplicative
weights algorithm normally maintains a distribution over $|\mathcal{X}|$ elements, one for each element in the data
universe. It can easily be implemented in such a way that when it is updated after a query Q arrives, only
those weights corresponding to elements in the support of the query Q are updated: for an m-sparse query,
this means it need only update m positions. It also comes with a guarantee that it never needs to perform
more than $\log|\mathcal{X}|/\alpha^2$ updates before achieving error $\alpha$, and so at most $m \log|\mathcal{X}|/\alpha^2$ elements ever need to
be updated. The key insight is to pick a smaller universe $\hat{\mathcal{X}}$ such that $|\hat{\mathcal{X}}| \geq m \log|\hat{\mathcal{X}}|/\alpha^2$, but not to commit
to the identity of the elements in this universe before running the algorithm, letting all elements be initially
unassigned. The algorithm then maintains a hash table mapping elements of $\mathcal{X}$ to elements of $\hat{\mathcal{X}}$. Elements
in $\mathcal{X}$ are assigned temporary mappings to elements in $\hat{\mathcal{X}}$ as queries come in, but are only assigned permanent
mappings when an update is performed. Because only $\log|\hat{\mathcal{X}}|/\alpha^2$ updates are ever performed, and $\hat{\mathcal{X}}$ was
chosen such that $|\hat{\mathcal{X}}| \geq m \log|\hat{\mathcal{X}}|/\alpha^2$, the algorithm never runs out of elements of $\hat{\mathcal{X}}$ to permanently assign.
Because $|\hat{\mathcal{X}}|$ depends only on the desired accuracy $\alpha$ and the sparsity parameter m, and not on $\mathcal{X}$ in any
way, the algorithm can be implemented and run without any knowledge of $\mathcal{X}$ (even for infinite universes),
and neither the running time nor the resulting accuracy depends on $|\mathcal{X}|$.

2 The algorithm must be able to read a name for each universe element it deals with, and so it can of course not deal with
elements that have no finite description length. But for a (countably) infinite universe, the running time would depend on the length
of the largest string used to denote a universe element encountered during the running of the algorithm, and not in any a-priori way
on the (unboundedly large) size of the universe.

The non-interactive mechanism releases a random projection of the database into polynomially many
dimensions, together with the corresponding projection matrix. Queries are evaluated by computing their
projection using the public projection matrix, and then taking the inner product of the projected query
and the projected database. The difficulty comes because the projection matrix projects vectors from
$|\mathcal{X}|$-dimensional space to poly(n)-dimensional space, and so normally would take $|\mathcal{X}| \cdot \mathrm{poly}(n)$-many bits to
represent. Our algorithms are constrained to run in time poly(n), however, and so we need a concise
representation of the projection matrix. We achieve this by using a matrix implicitly generated by a family
of limited-independence hash functions which have concise representations. This requires using a limited
independence version of the Johnson-Lindenstrauss lemma, and of concentration bounds. This algorithm
also gives accuracy bounds which are independent of $|\mathcal{X}|$.
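
As a rough illustration of an implicitly represented projection matrix, the following Python sketch generates each row of a random ±1 matrix from a low-degree polynomial hash over a prime field, so the matrix itself is never stored. This is only a minimal sketch under stated assumptions, not the construction analyzed in Section 4: the names `make_sign_hash`, `project` and `dot`, the prime modulus, the degree of independence, and the encoding of universe elements as non-negative integers are all hypothetical, and no noise is added, so the sketch by itself provides no privacy.

```python
import random

PRIME = (1 << 61) - 1  # assumed modulus; universe elements are ints below it


def make_sign_hash(independence, rng):
    """Return a function x -> +1/-1 defined by a random polynomial mod PRIME.

    A polynomial with `independence` random coefficients is described by
    that many numbers, which is what makes the row concise to represent.
    """
    coeffs = [rng.randrange(PRIME) for _ in range(independence)]

    def sign(x):
        acc = 0
        for c in coeffs:  # Horner evaluation of the polynomial at x
            acc = (acc * x + c) % PRIME
        return 1 if acc & 1 else -1  # low bit chooses the sign

    return sign


def project(sparse_vector, sign_hashes):
    """Project a sparse vector (dict: element -> value) to len(sign_hashes) dims."""
    t = len(sign_hashes)
    return [
        sum(value * sign(x) for x, value in sparse_vector.items()) / t ** 0.5
        for sign in sign_hashes
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
hashes = [make_sign_hash(4, rng) for _ in range(64)]
query = {3: 1.0, 17: 0.5}        # a 2-sparse query
database = {3: 0.25, 8: 0.75}    # a normalized database
estimate = dot(project(query, hashes), project(database, hashes))
```

Here `estimate` is a randomized estimate of the inner product of the two sparse vectors; only the elements in each vector's support are ever hashed, so the cost does not depend on the universe size.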



1.3 Related Work 

Differential privacy was introduced by Dwork, McSherry, Nissim, and Smith [DMNS06], and has since 
become the standard solution concept for privacy in the theoretical computer science literature. There is now 
a vast literature concerning differential privacy, so we mention here only the most relevant work, without 
attempting to be exhaustive. Dwork et al. [DMNS06] also introduced the Laplace mechanism, which is 
able to efficiently answer arbitrary low-sensitivity queries in the interactive setting. The Laplace mechanism 
does not make efficient use of the privacy budget however, and can answer only linearly many queries in the 
database size. 

Blum, Ligett, and Roth [BLR08] showed that in the non-interactive setting, it is possible to answer
exponentially sized families of counting queries. This result was extended and improved by Dwork et al.
[DNR+09] and Dwork, Rothblum, and Vadhan [DRV10], who gave improved running time and accuracy
bounds, and for $(\varepsilon, \delta)$-differential privacy gave similar results for arbitrary low sensitivity queries. Roth
and Roughgarden [RR10] showed that accuracy bounds comparable to [BLR08] could be achieved even
in the interactive setting, and this result was improved in both accuracy and running time by Hardt and
Rothblum, who give the multiplicative weights mechanism, which achieves nearly optimal accuracy and
running time [HR10]. Gupta, Roth, and Ullman [GRU11] generalize the algorithms of [RR10, HR10] into
a generic framework in which objects called iterative database constructions efficiently reduce to private
data release mechanisms in the interactive setting. Unfortunately, the running time of all of the algorithms
discussed here is at least linear in $|\mathcal{X}|$, and so typically exponential in the size of the private database.
Moreover, there are both computational and information theoretic lower bounds suggesting that it may be 
very difficult to give private release algorithms for generic linear queries with substantially better run time 
any answered query.

There is also a small body of work giving more efficient query release mechanisms for specific classes of
queries. [BLR08] gave an efficient (running time polynomial in the database size n) algorithm for releasing
the answers for 1-dimensional intervals on the discretized unit line in the non-interactive setting. As far as
we know, prior to this work, this was the only efficient mechanism in either the interactive or non-interactive
setting for releasing the answers to an exponentially sized family of queries with worst-case error. This
class is however structurally very simple: it has VC-dimension only 2. Other efficient algorithms relax the
notion of utility, no longer guaranteeing worst-case error for all queries. [BLR08] also gave an efficient
algorithm for releasing halfspace queries in the unit sphere, but this algorithm only guarantees accurate
answers for halfspaces that happen to have large margin with respect to the points in the database. Gupta
et al. [GHRU11] gave an algorithm for releasing conjunctions over d attributes to average error $\alpha$ over any
product distribution (over conjunctions), which runs in time $d^{O(1/\alpha)}$. This was improved to have running
time $O(d^{\log 1/\alpha})$ by Cheraghchi et al. [CKKL11]. Note that these algorithms only run in polynomial time
for constant values of $\alpha$, and only give accuracy bounds in expectation over random queries. Recently,
Hardt, Rothblum, and Servedio [HRS11] gave an algorithm for releasing conjunctions defined on k out
of d literals with an average-error guarantee for any pre-specified distribution in time $d^{O(\sqrt{k})}$. Using the
private boosting algorithm of [DRV10], they leverage this result to give an algorithm for releasing k-literal
conjunctions with worst-case error guarantees, which increases the running time to $d^{O(k)}$, although still only
requiring databases of size $d^{O(\sqrt{k})}$. They also gave an efficient (i.e. running time polynomial in n) algorithm
for releasing parity queries to low average error over product distributions. We remark that our results give
a complementary bound for large conjunctions (with a better sample complexity requirement). Our online
algorithm can release all conjunctions on $d - k$ out of d literals with worst-case error guarantees in time
$d^{O(k)}$

, requiring databases of size only logd). 

The efficient interactive mechanism we give in Section 3 is based on an analogy between iterative
database construction (IDC) algorithms and online learning algorithms in the mistake bound model. We 
implement the multiplicative weights IDC of Hardt and Rothblum [HR10] analogously to how Winnow 
is implemented in the infinite attribute model of Blum [Blu90]. In our setting, it can be thought of as an 
infinite universe model that has no dependence on the universe size in either the running time or accuracy 
bounds. This involves running the multiplicative weights algorithm on a much smaller universe. Hardt and 
Rothblum [HR10] also gave a version of their algorithm which ran on a small subset of the universe to give 
efficient run-time guarantees. The main difference is that we select the subset of the universe that we run 
the multiplicative weights algorithm on adaptively, based on the queries that arrive, whereas [HR10] select 
the subset nonadaptively, independently of the queries. [HR10] give average case utility bounds for linear 
queries on randomly selected databases; in contrast, we give worst-case utility bounds that hold for all input 
databases, but only for sparse linear queries. 

The efficient non-interactive mechanism we give in Section 4 is based on random projections using
families of limited independence hash functions, which have previously been used for space-bounded
computations in the streaming model [CW09, KN10]. Limited independence hash functions have also previously
been used for streaming algorithms in the context of differential privacy [DNP+10].

2 Preliminaries 

A database $\mathcal{D}$ is a multiset of elements from some (possibly infinite) abstract universe $\mathcal{X}$. We write $|\mathcal{D}| = n$
to denote the cardinality of $\mathcal{D}$. For any $x \in \mathcal{X}$ we can also write $\mathcal{D}[x] = |\{x' \in \mathcal{D} : x' = x\}|$ to denote
the number of elements of type x in the database. Viewed this way, a database $\mathcal{D} \in \mathbb{N}^{|\mathcal{X}|}$ is a vector with
integer entries in the range [0, n].
A linear query $Q : \mathcal{X} \to [0, 1]$ is a function mapping elements in the universe to values on the real unit
interval. For notational convenience, we will define $Q(\emptyset) = 0$. We can also evaluate a linear query on a
database. The value of a linear query Q on a database is simply the average value of Q on elements of the
database:

$$ Q(\mathcal{D}) = \frac{1}{n} \sum_{x \in \mathcal{D}} Q(x) = \frac{1}{n} \sum_{x \in \mathcal{X}} Q(x)\,\mathcal{D}[x] $$

Similarly to how we can think of a database as a vector, we can think of a query as a vector $Q \in [0, 1]^{|\mathcal{X}|}$
with $Q[x] = Q(x)$. Viewed this way, $Q(\mathcal{D}) = \frac{1}{n}\langle Q, \mathcal{D} \rangle$.

It will sometimes be convenient to think of normalized databases (with entries that sum to 1). For
a database $\mathcal{D}$ of size n, we define the corresponding normalized database $\hat{\mathcal{D}}$ to be the database such
that $\hat{\mathcal{D}}[x] = \mathcal{D}[x]/n$. We evaluate a linear query on a normalized database by computing
$Q(\hat{\mathcal{D}}) = \sum_{x \in \mathcal{X}} Q(x)\,\hat{\mathcal{D}}[x] = \langle Q, \hat{\mathcal{D}} \rangle$. Note that $Q(\hat{\mathcal{D}}) = Q(\mathcal{D})$.
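
The definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `evaluate_linear_query`, the dictionary encoding of a sparse query, and the toy records are assumptions made for the example and do not come from the paper.

```python
from collections import Counter


def evaluate_linear_query(query, database):
    """Average value of `query` over the records of `database`.

    `query` maps universe elements to values in [0, 1]; elements absent
    from the dict have value 0, so the dict lists exactly the support.
    """
    n = len(database)
    histogram = Counter(database)  # histogram[x] plays the role of D[x]
    # Only elements in the support of the query contribute to the sum.
    total = sum(value * histogram[x] for x, value in query.items())
    return total / n


database = ["alice", "bob", "bob", "carol"]  # n = 4
query = {"bob": 1.0, "dave": 0.5}            # a 2-sparse query
answer = evaluate_linear_query(query, database)  # (1.0 * 2 + 0.5 * 0) / 4 = 0.5
```

Because the sum ranges over the support of the query rather than over the universe, the cost of one evaluation is governed by the sparsity m and the database size n, not by the size of the universe.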

Definition 2.1 (Sparsity). The sparsity of a linear query Q is $|\{x \in \mathcal{X} : Q(x) > 0\}|$, the number of elements
in the universe on which it takes a non-zero value. We say that a query is m-sparse if its sparsity is at most
m. We will also refer to the class of all m-sparse linear queries, denoted $\mathcal{Q}_m$.

In this paper, we will assume that given an m-sparse query, we can quickly (in time polynomial in m)
enumerate the elements $x \in \mathcal{X}$ on which $Q(x) > 0$.

Remark 2.2. While the assumption that we can quickly enumerate the non-zero values of a query may
not always hold, it is indeed the case that for many natural classes of queries, we can enumerate the
non-zero elements in time linear in m. For example, this holds for queries that are specified as lists of the
universe elements on which the query is non-zero, as well as for many implicitly defined query classes such
as conjunctions, disjunctions, parities, etc.³ Of course, classes like conjunctions are typically not sparse,
but conjunctions with $d - O(\log n)$ literals are, and their support can be quickly enumerated (even though
there are superpolynomially many such conjunctions).
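
To make the enumeration concrete, here is a small Python sketch that lists the support of a conjunction over the boolean hypercube by iterating over the unassigned coordinates. The function name `conjunction_support` and the encoding of a conjunction as a map from coordinate index to required bit are assumptions made for this example.

```python
from itertools import product


def conjunction_support(d, literals):
    """Yield every point of {0,1}^d that satisfies the conjunction.

    `literals` maps a coordinate index to the bit it is required to take.
    Exactly 2 ** (d - len(literals)) points are yielded.
    """
    free = [i for i in range(d) if i not in literals]
    for bits in product((0, 1), repeat=len(free)):
        point = [0] * d
        for i, b in literals.items():  # coordinates fixed by the conjunction
            point[i] = b
        for i, b in zip(free, bits):   # one setting of the unassigned variables
            point[i] = b
        yield tuple(point)


# d = 4 with 2 literals: the support has 2 ** 2 = 4 points.
support = list(conjunction_support(4, {0: 1, 2: 0}))
```

With $d - \log n$ literals, the loop runs over $2^{\log n} = n$ settings of the unassigned variables, so the enumeration takes time linear in the sparsity.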



2.1 Utility 

We will design algorithms which can accurately answer large numbers of sparse linear queries. We will be 
interested in both interactive mechanisms and non-interactive mechanisms. A non-interactive mechanism 
takes as input a database, runs one time, and outputs some data structure capable of answering many queries 
without further interaction with the data release mechanism. An interactive mechanism takes as input a 
stream of queries, and must provide a numeric answer to each query before the next one arrives. 

Definition 2.3 (Accuracy for Non-Interactive Mechanisms). Let $\mathcal{Q}$ be a set of queries. A non-interactive
mechanism $M : \mathcal{X}^* \to R$ for some abstract range R is $(\alpha, \beta)$-accurate for $\mathcal{Q}$ if there exists a function
$\mathrm{Eval} : \mathcal{Q} \times R \to \mathbb{R}$ s.t. for every database $\mathcal{D} \in \mathcal{X}^*$, with probability at least $1 - \beta$ over the coins of
M, $M(\mathcal{D})$ outputs $r \in R$ such that $\max_{Q \in \mathcal{Q}} |Q(\mathcal{D}) - \mathrm{Eval}(Q, r)| \leq \alpha$. We will abuse notation and write
$Q(r) = \mathrm{Eval}(Q, r)$.

M is efficient if both M and Eval run in time polynomial in the size of the database n.

Definition 2.4 (Accuracy for Interactive Mechanisms). Let $\mathcal{Q}$ be a set of queries. An interactive mechanism
M takes as input an adaptively chosen stream of queries $Q_1, \ldots, Q_k \in \mathcal{Q}$ and for each query $Q_i$, outputs an
answer $a_i \in \mathbb{R}$ before receiving $Q_{i+1}$. It is $(\alpha, \beta)$-accurate if for every database $\mathcal{D} \in \mathcal{X}^*$, with probability
at least $1 - \beta$ over the coins of M: $\max_i |Q_i(\mathcal{D}) - a_i| \leq \alpha$.

M is efficient if the update time for each query (i.e. the time to produce answer $a_i$ after receiving query
$Q_i$) is polynomial in the size of the database n.

3 Conjunctions over the d-dimensional boolean hypercube with $d - \log(n)$ literals are n-sparse. Even though there are
superpolynomially many such conjunctions, it is simple to enumerate the entries on which these conjunctions take non-zero value
in time linear in n. We can simply enumerate all of the $2^{\log n} = n$ values that the unassigned variables can take.

2.2 Differential Privacy 

We will require that our algorithms satisfy differential privacy, defined as follows. We must first define the 
notion of neighboring databases. 

Definition 2.5 (Neighboring Databases). Two databases $\mathcal{D}, \mathcal{D}'$ are neighbors if they differ only in the data
of a single individual: i.e. if their symmetric difference satisfies $|\mathcal{D} \triangle \mathcal{D}'| \leq 1$.

Definition 2.6 (Differential Privacy [DMNS06]). A randomized algorithm M acting on databases and
outputting elements from some abstract range R is $(\varepsilon, \delta)$-differentially private if for all pairs of neighboring
databases $\mathcal{D}, \mathcal{D}'$ and for all subsets of the range $S \subseteq R$ the following holds:

$$ \Pr[M(\mathcal{D}) \in S] \leq \exp(\varepsilon) \Pr[M(\mathcal{D}') \in S] + \delta $$

Remark 2.7. For a non-interactive mechanism, R is simply the set of data-structures that the mechanism 
outputs. For an interactive mechanism, because the queries may be adaptively chosen by an adversary, R is 
the set of query/answer transcripts produced by the algorithm when interacting with an arbitrary adversary. 
For a detailed treatment of differential privacy and adaptive adversaries, see [DRV10].

A useful distribution is the Laplace distribution. 

Definition 2.8 (The Laplace Distribution). The Laplace Distribution (centered at 0) with scale b is the
distribution with probability density function: $\mathrm{Lap}(x \mid b) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right)$. We will sometimes write Lap(b)
to denote the Laplace distribution with scale b, and will sometimes abuse notation and write Lap(b) simply
to denote a random variable $X \sim \mathrm{Lap}(b)$.

A fundamental result in data privacy is that perturbing low sensitivity queries with Laplace noise
preserves $(\varepsilon, 0)$-differential privacy.

Theorem 2.9 ([DMNS06]). Suppose $Q : \mathcal{X}^* \to \mathbb{R}$ is a function such that for all neighboring databases $\mathcal{D}$
and $\mathcal{D}'$, $|Q(\mathcal{D}) - Q(\mathcal{D}')| \leq c$. Then the procedure which on input $\mathcal{D}$ releases $Q(\mathcal{D}) + X$, where X is a draw
from a $\mathrm{Lap}(c/\varepsilon)$ distribution, preserves $(\varepsilon, 0)$-differential privacy.
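
The procedure in Theorem 2.9 can be sketched as follows. This is a minimal illustration, not a hardened implementation: the function names are hypothetical, the sensitivity value 1/n used in the example is an assumption (it corresponds to changing one record of a size-n database and should be recomputed for whichever neighboring relation is in force), and floating-point sampling of this kind is not suitable for production privacy guarantees.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Lap(scale), centered at 0."""
    # The difference of two independent Exp(1) draws is distributed as Lap(1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Release true_answer perturbed by Lap(sensitivity / epsilon) noise."""
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
n = 1000
# Hypothetical linear query with exact answer 0.25 and assumed sensitivity 1/n.
noisy_answer = laplace_mechanism(0.25, sensitivity=1.0 / n, epsilon=0.5, rng=rng)
```

The noise scale is c/ε, so halving ε doubles the typical error; this is the trade-off that limits the number of queries that can be answered by independent perturbation alone.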

It will be useful to understand how privacy parameters for individual steps of an algorithm compose into 
privacy guarantees for the entire algorithm. The following useful theorem is a special case of a theorem 
proven by Dwork, Rothblum, and Vadhan: 

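Theorem 2.10 (Privacy Composition [DRV10]). Let $0 < \varepsilon, \delta < 1$, and let $M_1, \ldots, M_T$ be
$(\varepsilon', 0)$-differentially private algorithms for some $\varepsilon'$ at most:

$$ \frac{\varepsilon}{\sqrt{8T \log(1/\delta)}} $$

Then the algorithm M which outputs $M(\mathcal{D}) = (M_1(\mathcal{D}), \ldots, M_T(\mathcal{D}))$ is $(\varepsilon, \delta)$-differentially private.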
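
As a worked illustration of how a total privacy budget is split across T steps, the sketch below computes a per-step parameter. It assumes that the bound in Theorem 2.10 has the form ε' = ε / sqrt(8 T log(1/δ)) with a natural logarithm; the function name `per_step_epsilon` is hypothetical.

```python
import math


def per_step_epsilon(epsilon, delta, T):
    """Per-step privacy parameter under the assumed composition bound."""
    if not (0 < epsilon < 1 and 0 < delta < 1 and T >= 1):
        raise ValueError("expected 0 < epsilon, delta < 1 and T >= 1")
    # Assumed form of the bound: epsilon / sqrt(8 * T * ln(1 / delta)).
    return epsilon / math.sqrt(8 * T * math.log(1 / delta))


eps_100 = per_step_epsilon(0.5, 1e-6, T=100)
eps_400 = per_step_epsilon(0.5, 1e-6, T=400)
```

Under this form the per-step budget shrinks like 1/sqrt(T), so quadrupling the number of steps only halves the budget available to each one.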
3 A Fast IDC Algorithm For Sparse Queries



In this section we use the abstraction of an iterative database construction that was introduced by Gupta,
Roth, and Ullman [GRU11]. It was shown in [GRU11] that efficient IDC algorithms automatically reduce
to efficient differentially private query release mechanisms in the interactive setting. Roughly, an IDC
mechanism works by maintaining a sequence of data structures $\mathcal{D}_1, \mathcal{D}_2, \ldots$ that give increasingly good
approximations to the input database $\mathcal{D}$ (in a sense that depends on the IDC). Moreover, these mechanisms
produce the next data structure in the sequence by considering only one query Q that distinguishes the real
database in the sense that $Q(\mathcal{D}_t)$ differs significantly from $Q(\mathcal{D})$.

Syntactically, we will consider functions of the form $U : \mathcal{R}_U \times \mathcal{Q} \times \mathbb{R} \to \mathcal{R}_U$. The inputs to U
are a data structure in $\mathcal{R}_U$, which represents the current data structure $\mathcal{D}_t$; a query Q, which represents the
distinguishing query, and may be restricted to a certain set $\mathcal{Q}$; and also a real number which estimates $Q(\mathcal{D})$.
Formally, we define a database update sequence to capture the sequence of inputs to U used to generate
the database sequence $\mathcal{D}_1, \mathcal{D}_2, \ldots$.

Definition 3.1 (Database Update Sequence). Let $\mathcal{D} \in \mathbb{N}^{|\mathcal{X}|}$ be any database and let
$\{(\mathcal{D}_t, Q_t, A_t)\}_{t=1,\ldots,T} \in (\mathcal{R}_U \times \mathcal{Q} \times \mathbb{R})^T$ be a sequence of tuples. We say the sequence is a
$(U, \mathcal{D}, \mathcal{Q}, \alpha, T)$-database update sequence if it satisfies the following properties:

1. $\mathcal{D}_1 = U(\emptyset, \cdot, \cdot)$,

2. for every $t = 1, 2, \ldots, T$, $|Q_t(\mathcal{D}) - Q_t(\mathcal{D}_t)| \geq \alpha$,

3. for every $t = 1, 2, \ldots, T$, $|Q_t(\mathcal{D}) - A_t| < \alpha$,

4. and for every $t = 1, 2, \ldots, T - 1$, $\mathcal{D}_{t+1} = U(\mathcal{D}_t, Q_t, A_t)$.



Definition 3.2 (Iterative Database Construction). Let $U : \mathcal{R}_U \times \mathcal{Q} \times \mathbb{R} \to \mathcal{R}_U$ be an update rule and let
$B : \mathbb{R} \to \mathbb{R}$ be a function. We say U is a $B(\alpha)$-iterative database construction for query class $\mathcal{Q}$ if for
every database $\mathcal{D} \in \mathbb{N}^{|\mathcal{X}|}$, every $(U, \mathcal{D}, \mathcal{Q}, \alpha, T)$-database update sequence satisfies $T \leq B(\alpha)$.

Note that the definition of a $B(\alpha)$-iterative database construction implies that if U is a $B(\alpha)$-iterative
database construction, then given any maximal $(U, \mathcal{D}, \mathcal{Q}, \alpha, T)$-database update sequence, the final database
$\mathcal{D}_T$ must satisfy $\max_{Q \in \mathcal{Q}} |Q(\mathcal{D}) - Q(\mathcal{D}_T)| < \alpha$, or else there would exist another query satisfying property
2 of Definition 3.1, and thus there would exist a $(U, \mathcal{D}, \mathcal{Q}, \alpha, T+1)$-database update sequence, contradicting
maximality.

$B(\alpha)$-IDC algorithms generically reduce to $(\varepsilon, \delta)$-differentially private $(\alpha, \beta)$-accurate query release
mechanisms in an efficiency preserving way. This framework was implicitly used by [RR10] and [HR10].

Theorem 3.3 ([GRU11]). If there exists a $B(\alpha)$-IDC algorithm for a class of queries $\mathcal{Q}$ using a class
of data structures $\mathcal{R}_U$ that take time at most $p(n, \alpha, |\mathcal{X}|)$ to update their hypotheses, and time at most
$q(n, \alpha, |\mathcal{X}|)$ to evaluate a query on any $\mathcal{D} \in \mathcal{R}_U$, then for any $0 < \varepsilon, \delta, \beta < 1$ there exists an
$(\varepsilon, \delta)$-differentially private query release mechanism in the interactive setting that has update time at most
$O(p(n, \alpha, |\mathcal{X}|) + q(n, \alpha, |\mathcal{X}|))$ and is $(\alpha, \beta)$-accurate with respect to any adaptively chosen sequence of k
queries from $\mathcal{Q}$, where $\alpha$ is the solution to the following equality:



$$ \alpha = \frac{3000 \sqrt{B(\alpha)} \log(4/\delta) \log(k/\beta)}{\varepsilon n} $$
In this section we will give an efficient IDC algorithm for the class of m-sparse queries, and then call on
Theorem 3.3 to reduce it to a differentially private query release mechanism in the interactive setting.

First we introduce the Sparse Multiplicative Weights data structure, which will be the class of data
structures $\mathcal{R}_U$ that the Sparse Multiplicative Weights IDC algorithm uses.
Definition 3.4 (Sparse Multiplicative Weights Data Structure). The sparse multiplicative weights data
structure $\mathcal{D}^{SMW}$ of size s is composed of three parts. We write $\mathcal{D}^{SMW} = (\mathcal{D}, h, \mathrm{ind})$.

1. $\mathcal{D}$ is a collection of s real valued variables $x_1, \ldots, x_s$, with $x_i \in [0, 1]$ for all $i \in [s]$. Variable $x_i$ for
$i \in [s]$ is referenced by $\mathcal{D}[i]$. Initially $x_i = 1/s$ for all $i \in [s]$. We define $\mathcal{D}[i] = 0$ for all $i > s$.

2. h is a hash table $h : \mathcal{X} \to [s] \cup \{\emptyset\}$ mapping elements in the universe $\mathcal{X}$ to indices $i \in [s]$. Elements
$x \in \mathcal{X}$ can also be unassigned, in which case we write $h(x) = \emptyset$. Initially, $h(x) = \emptyset$ for all $x \in \mathcal{X}$. We
write $h^{-1}(i) = x$ if $h(x) = i$, and $h^{-1}(i) = \emptyset$ if there does not exist any $x \in \mathcal{X}$ such that $h(x) = i$.

3. $\mathrm{ind} \in [s + 1]$ is a counter denoting the index of the first unassigned variable. For all $i < \mathrm{ind}$, there
exists some $x \in \mathcal{X}$ such that $h(x) = i$. For all $i \geq \mathrm{ind}$, there does not exist any $x \in \mathcal{X}$ such that
$h(x) = i$. Initially $\mathrm{ind} = 1$.

If $\mathrm{ind} \leq s$, we can add an unassigned element $x \in \mathcal{X}$ to $\mathcal{D}^{SMW}$. Adding an element $x \in \mathcal{X}$ to $\mathcal{D}^{SMW}$
sets $h(x) \leftarrow \mathrm{ind}$ and increments $\mathrm{ind} \leftarrow \mathrm{ind} + 1$. If $\mathrm{ind} = s + 1$, attempting to add an element causes the
data structure to report FAILURE.

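A linear query Q is evaluated on a sparse MW data structure $\mathcal{D}^{SMW} = (\mathcal{D}, h, \mathrm{ind})$ as follows.

$$ Q(\mathcal{D}^{SMW}) = \sum_{x \in \mathcal{X} : Q(x) > 0 \wedge h(x) \neq \emptyset} Q(x) \cdot \mathcal{D}[h(x)] + \sum_{x \in \mathcal{X} : Q(x) > 0 \wedge h(x) = \emptyset} Q(x) \cdot \mathcal{D}[\mathrm{ind}] $$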
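
The data structure and its evaluation rule can be sketched in Python as below. This is an illustrative sketch, not the paper's implementation: the class and method names are hypothetical, a Python dict stands in for the hash table h (a missing key means "unassigned"), and, as a convenience not stated in the definition, `add` returns the existing index when the element is already assigned.

```python
class SparseMWDataStructure:
    def __init__(self, s):
        self.s = s
        self.weights = [1.0 / s] * s  # variables x_1..x_s, stored 0-indexed
        self.h = {}                   # assigned elements only
        self.ind = 1                  # 1-based index of the first unassigned variable

    def get(self, i):
        """D[i] with 1-based indexing; D[i] = 0 for every i > s."""
        return self.weights[i - 1] if i <= self.s else 0.0

    def add(self, x):
        """Assign the next free variable to x, or report FAILURE if none is left."""
        if x in self.h:
            return self.h[x]
        if self.ind == self.s + 1:
            raise RuntimeError("FAILURE: no unassigned variables left")
        self.h[x] = self.ind
        self.ind += 1
        return self.h[x]

    def evaluate(self, query):
        """Evaluate a sparse query given as a dict: element -> value in (0, 1]."""
        total = 0.0
        for x, value in query.items():
            # Assigned elements read their own variable; unassigned ones read D[ind].
            index = self.h.get(x, self.ind)
            total += value * self.get(index)
        return total


ds = SparseMWDataStructure(4)
ds.add("a")
value = ds.evaluate({"a": 1.0, "b": 0.5})  # 1.0 * 0.25 + 0.5 * 0.25 = 0.375
```

Note that evaluation touches only the support of the query, and that once every variable has been assigned, unassigned elements read $\mathcal{D}[s+1] = 0$.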

We now present Algorithm 1, the Sparse Multiplicative Weights (SMW) IDC algorithm for m-sparse
queries. The algorithm is a version of the Hardt/Rothblum Multiplicative Weights IDC [HR10], modified to
work without any dependence on the universe size. It will run multiplicative weights update steps over the
variables of the SMW data structure, using the SMW data structure to delay assigning variables to particular
universe elements $x \in \mathcal{X}$ until necessary. Note that it is not simply running the multiplicative weights
algorithm from [HR10] implicitly: doing so would yield guarantees that depend on the cardinality of the
universe $|\mathcal{X}|$. Instead, the guarantees we will get will depend only on m, and so will carry over even to the
infinite-universe setting.

Theorem 3.5. The Sparse Multiplicative Weights algorithm is a $B(\alpha)$-IDC for the class of $m$-sparse queries
$\mathcal{Q}_m$, where:

$$B(\alpha) = 4 \cdot \frac{\log s + 1}{\alpha^2}$$

and $s$ is the smallest integer such that $s/(\log(s) + 1) \ge 4m/\alpha^2$.
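To get a feel for the sizes involved, the following sketch computes $s$ and $B(\alpha)$ for given $m$ and $\alpha$. It assumes natural logarithms (the base is not fixed in the statement), and the function names `smallest_s` and `idc_bound` are hypothetical.

```python
import math


def smallest_s(m: int, alpha: float) -> int:
    """Smallest integer s with s / (log(s) + 1) >= 4m / alpha^2 (natural log assumed)."""
    target = 4.0 * m / alpha ** 2

    def ratio(s: int) -> float:
        return s / (math.log(s) + 1.0)

    # ratio is nondecreasing for s >= 1, so doubling plus binary search is enough.
    hi = 1
    while ratio(hi) < target:
        hi *= 2
    lo = max(1, hi // 2)
    while lo < hi:
        mid = (lo + hi) // 2
        if ratio(mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


def idc_bound(m: int, alpha: float) -> float:
    """B(alpha) = 4 (log s + 1) / alpha^2 for the s chosen above."""
    s = smallest_s(m, alpha)
    return 4.0 * (math.log(s) + 1.0) / alpha ** 2
```

By the choice of $s$, the product $m \cdot B(\alpha)$ never exceeds $s$, which is the fact used at the end of the proof to rule out FAILURE.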

The analysis largely follows the Multiplicative Weights analysis given by Hardt and Rothblum [HR10].
The main difference is that rather than using one global potential function, we must use a different potential
function for each database update sequence, defined as a function of the state of the hash table in the last
SMW data structure in the sequence. We must also argue that we never run out of variables to assign in the
SMW data structure, which would cause it to return FAILURE. To argue this, we apply the technique of
Blum [Blu90], used to adapt Winnow to the infinite attribute model.

Proof. We will consider any maximal $(\mathrm{SMW}, D^{\mathrm{SMW}}, \mathcal{Q}, \alpha, T)$-database update sequence
$\{(D_t^{\mathrm{SMW}}, Q_t, \hat{A}_t)\}_{t=1,\ldots,T}$. We will argue that $T \le B(\alpha)$ and that no data structure
$D_t^{\mathrm{SMW}}$ in the sequence ever returns FAILURE when the SMW algorithm attempts to add some element
$x \in X$ to it. Consider the real private database $V$ and the final data structure in the sequence
$D_T^{\mathrm{SMW}} = (D_T, h_T, \mathrm{ind}_T)$.
Algorithm 1 The Sparse Multiplicative Weights (SMW) IDC Algorithm for $m$-sparse queries. It is
instantiated with an accuracy parameter $\eta = \alpha/2$. It takes as input a sparse MW data structure
$D^{\mathrm{SMW}}$, an $m$-sparse query $Q \in \mathcal{Q}_m$, and an estimate of the query value $\hat{A}$.

SMW($D_t^{\mathrm{SMW}} = (D_t, h_t, \mathrm{ind}_t)$, $Q_t$, $\hat{A}_t$):
  if $D_t^{\mathrm{SMW}} = \emptyset$ then
    Let $s$ be the smallest integer such that $s/(\log(s) + 1) \ge 4m/\alpha^2$.
    Return a new Sparse MW data structure $D_1^{\mathrm{SMW}} = (D_1, h_1, \mathrm{ind}_1)$ of size $s$ with
    $h_1(x) = \emptyset$ for all $x \in X$, $x_i = 1/s$ for all $i \in [s]$, and $\mathrm{ind}_1 = 1$.
  end if
  Let $D_{t+1}^{\mathrm{SMW}} = (D_{t+1}, h_{t+1}, \mathrm{ind}_{t+1}) \leftarrow D_t^{\mathrm{SMW}}$
  Update: For all $x \in X$ such that $Q_t(x) > 0$: If $h_{t+1}(x) = \emptyset$ then add $x$ to $D_{t+1}^{\mathrm{SMW}}$.
  if $\hat{A}_t < Q_t(D_t^{\mathrm{SMW}})$ then
    Update: For all $x \in X$ such that $Q_t(x) > 0$: Let
      $D_{t+1}[h_{t+1}(x)] \leftarrow D_{t+1}[h_{t+1}(x)] \cdot \exp(-\eta Q_t(x))$
  else
    Update: For all $x \in X$ such that $Q_t(x) > 0$: Let
      $D_{t+1}[h_{t+1}(x)] \leftarrow D_{t+1}[h_{t+1}(x)] \cdot \exp(\eta Q_t(x))$
  end if
  Normalize: For all $i \in [s]$:
      $D_{t+1}[i] \leftarrow \dfrac{D_{t+1}[i]}{\sum_{j=1}^{s} D_{t+1}[j]}$
  Output $D_{t+1}^{\mathrm{SMW}}$.
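A single update step of Algorithm 1 can be sketched as follows. The sketch is illustrative only: it assumes the data structure is held as a plain tuple `(D, h, ind)` with a Python list and dictionary, that a query is passed as a dictionary over its support, and that the data structure has already been initialized; the names `evaluate` and `smw_update` are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Tuple

State = Tuple[List[float], Dict[Hashable, int], int]  # (D, h, ind), 1-based indices


def evaluate(state: State, query: Dict[Hashable, float]) -> float:
    """Evaluate the query; unassigned elements read D[ind], and D[i] = 0 for i > s."""
    D, h, ind = state

    def weight(i: int) -> float:
        return D[i - 1] if i <= len(D) else 0.0

    return sum(q * weight(h.get(x, ind)) for x, q in query.items() if q > 0)


def smw_update(state: State, query: Dict[Hashable, float],
               estimate: float, alpha: float) -> State:
    """One SMW step with eta = alpha / 2; returns a new state, leaving the input intact."""
    D, h, ind = state
    eta = alpha / 2.0
    new_D, new_h, new_ind = list(D), dict(h), ind

    # Assign a fresh variable to every element of the query's support not yet seen.
    for x, q in query.items():
        if q > 0 and x not in new_h:
            if new_ind == len(new_D) + 1:
                raise RuntimeError("FAILURE")
            new_h[x] = new_ind
            new_ind += 1

    # Decrease weights if the estimate is below the current answer, else increase.
    sign = -1.0 if estimate < evaluate(state, query) else 1.0
    for x, q in query.items():
        if q > 0:
            new_D[new_h[x] - 1] *= math.exp(sign * eta * q)

    total = sum(new_D)
    return [w / total for w in new_D], new_h, new_ind
```

Only the variables touched by the support of the query are rescaled before normalization, so one step costs time proportional to $s$ rather than to $|X|$.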
We will define a potential function $\Psi_t$ based on $h_T$ and $V$, show that it is bounded from below, and
show that it decreases significantly at each step. We define:

$$\Psi_t = \sum_{x :\, h_T(x) \neq \emptyset} V[x] \log\left(\frac{V[x]}{D_t[h_T(x)]}\right)$$

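Claim 3.6. For all $t \in [T]$, $\Psi_t \ge -1/e$ and $\Psi_0 \le \log s$.

Proof. The log-sum inequality states that for any collection of non-negative numbers $a_1, \ldots, a_n$ and
$b_1, \ldots, b_n$:

$$\sum_{i=1}^{n} a_i \log\left(\frac{a_i}{b_i}\right) \ge a \log\left(\frac{a}{b}\right)$$

where $a = \sum_{i=1}^{n} a_i$ and $b = \sum_{i=1}^{n} b_i$. We therefore have:

$$\begin{aligned}
\Psi_t &= \sum_{x :\, h_T(x) \neq \emptyset} V[x] \log\left(\frac{V[x]}{D_t[h_T(x)]}\right) \\
&\ge \left(\sum_{x :\, h_T(x) \neq \emptyset} V[x]\right) \log\left(\frac{\sum_{x :\, h_T(x) \neq \emptyset} V[x]}{\sum_{x :\, h_T(x) \neq \emptyset} D_t[h_T(x)]}\right) \\
&\ge \left(\sum_{x :\, h_T(x) \neq \emptyset} V[x]\right) \log\left(\sum_{x :\, h_T(x) \neq \emptyset} V[x]\right) \\
&\ge -\frac{1}{e}
\end{aligned}$$

where the first inequality follows from the log-sum inequality, the second follows from the fact that
$\sum_{x :\, h_T(x) \neq \emptyset} D_t[h_T(x)] \le 1$, and the third follows from the fact that
$\min_{a \in [0,1]} a \log a = -1/e$. To see that $\Psi_0 \le \log s$, recall that $D_0[i] = 1/s$ for all $i$. Therefore:

$$\Psi_0 = \sum_{x :\, h_T(x) \neq \emptyset} V[x] \log\left(s \cdot V[x]\right)$$

Since $V$ is a probability distribution, this expression takes maximum value $\log s$. □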

We will argue that in every step the potential drops by at least $\alpha^2/4$. Because the potential begins at
no more than $\log s$ and can never fall below $-1/e$, there can be at most $T \le 4(\log s + 1/e)/\alpha^2$ steps.
To begin, let us see exactly how much the potential drops at each step:

Lemma 3.7.

$$\Psi_t - \Psi_{t+1} \ge \alpha^2/4$$
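Proof. We follow the analysis of [HR10]. We consider the case in which $\hat{A}_t < Q_t(D_t^{\mathrm{SMW}})$. In this case:

$$\begin{aligned}
\Psi_t - \Psi_{t+1} &= \sum_{x :\, h_T(x) \neq \emptyset} V[x] \log\left(\frac{D_{t+1}[h_T(x)]}{D_t[h_T(x)]}\right) \\
&\ge \sum_{x :\, Q_t(x) > 0} V[x] \log\left(\exp(-\eta Q_t(x))\right) - \log\left(\sum_{j=1}^{s} \exp\left(-\eta Q_t(h_t^{-1}(j))\right) D_t[j]\right) \\
&= \sum_{x :\, Q_t(x) > 0} -V[x]\, \eta\, Q_t(x) - \log\left(\sum_{j=1}^{s} \exp\left(-\eta Q_t(h_t^{-1}(j))\right) D_t[j]\right) \\
&= -\eta Q_t(V) - \log\left(\sum_{j=1}^{s} \exp\left(-\eta Q_t(h_t^{-1}(j))\right) D_t[j]\right) \\
&\ge -\eta Q_t(V) - \log\left(\sum_{j=1}^{s} \left(1 - \eta Q_t(h_t^{-1}(j)) + \eta^2\right) D_t[j]\right) \\
&= -\eta Q_t(V) - \log\left(1 + \eta^2 - \eta \sum_{x :\, Q_t(x) > 0} Q_t(x) D_t[h_t(x)]\right) \\
&\ge \eta \left(Q_t(D_t^{\mathrm{SMW}}) - Q_t(V)\right) - \eta^2 \\
&\ge \alpha^2/2 - \alpha^2/4 \\
&= \alpha^2/4
\end{aligned}$$

In this calculation, we used the facts that:

$$\exp(-\eta Q_t(x_i)) \le 1 - \eta Q_t(x_i) + \eta^2 Q_t(x_i)^2 \le 1 - \eta Q_t(x_i) + \eta^2,$$

that $\sum_{i=1}^{s} D_t[i] = 1$, that $\log(1 + y) \le y$ for $y > -1$, that by the definition of a database update
sequence, when $\hat{A}_t < Q_t(D_t^{\mathrm{SMW}})$ we also have that $Q_t(V) < Q_t(D_t^{\mathrm{SMW}})$, and that by the definition
of a database update sequence we always have $|Q_t(D_t^{\mathrm{SMW}}) - Q_t(V)| \ge \alpha$. Finally we recall that
$\eta = \alpha/2$. The case when $\hat{A}_t \ge Q_t(D_t^{\mathrm{SMW}})$ is exactly similar. □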

Theorem 3.5 then immediately follows by combining Claim 3.6 with Lemma 3.7:

$$-\frac{1}{e} \le \Psi_T \le \log s - T \cdot \frac{\alpha^2}{4}$$

Solving for $T$ we find:

$$T \le 4 \cdot \frac{\log s + 1/e}{\alpha^2} \le 4 \cdot \frac{\log s + 1}{\alpha^2}$$

Finally, to see that the SMW data structure never reports FAILURE, it suffices to observe that $\mathrm{ind}_T \le s$.
Because each query $Q_t$ is assumed to be $m$-sparse, at most $m$ variables can be added to the SMW data
structure at each update. Therefore, we have

$$\mathrm{ind}_T \le m \cdot T \le \frac{4m(\log s + 1)}{\alpha^2} \le s$$

The last inequality follows from recalling that we chose $s$ such that $s/(\log s + 1) \ge 4m/\alpha^2$. This
completes the proof. □

Finally, we may observe that both the update time for the SMW IDC and the time to evaluate a query
on the SMW data structure are $O(s) = \tilde{O}(m/\alpha^2)$. Therefore, we may instantiate Theorem 3.3 with the SMW
IDC algorithm to obtain the main result of this section:

Theorem 3.8. For any $0 < \epsilon, \delta, \beta < 1$ there exists an $(\epsilon, \delta)$-differentially private query release
mechanism in the interactive setting, with running time per query $\tilde{O}(m/\alpha^2)$, that is $(\alpha, \beta)$-accurate with
respect to the set of all $m$-sparse linear queries $\mathcal{Q}_m$, with:



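$$\alpha = \tilde{O}\left(\frac{(\log m)^{1/4} \left(\log\frac{1}{\delta} \cdot \log\frac{k}{\beta}\right)^{1/2}}{(\epsilon n)^{1/2}}\right)$$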



Proof. The proof follows by instantiating Theorem 3.3 with the SMW IDC algorithm, together with the
bound $B(\alpha) = 4(\log s + 1)/\alpha^2$ proven in Theorem 3.5, and recalling that $s$ is the smallest integer such that
$s/(\log s + 1) \ge 4m/\alpha^2$. □

3.1 Applications to Conjunctions 

In this section, we briefly mention a simple application of this algorithm to the problem of releasing
conjunctions with many literals. The algorithm given in this section leads to new results for releasing
conjunctions on $d - k$ out of $d$ literals. This complements the recent results of Hardt, Rothblum, and
Servedio [HRS11] for releasing conjunctions on $k$ out of $d$ literals. The class of conjunctions is defined over
the universe $X = \{0, 1\}^d$, the $d$-dimensional boolean hypercube.

Definition 3.9. A conjunction is a linear query specified by a subset of variables $S \subseteq [d]$, and defined by
the predicate $Q_S : \{0, 1\}^d \to \{0, 1\}$ where $Q_S(x) = \prod_{i \in S} x_i$. We say that a conjunction $Q_S$ has $t$
literals if $|S| = t$.

Remark 3.10. The set of all conjunctions of $d - k$ literals, denoted $C_{d-k}$, is $2^k$-sparse, and of size
$|C_{d-k}| \le d^k$.
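The counting behind this remark can be checked by brute force on a small instance. The following sketch (with the illustrative choice $d = 5$, $k = 2$, and hypothetical helper names) enumerates every conjunction on $d - k$ literals and the points of $\{0,1\}^d$ on which it evaluates to 1.

```python
from itertools import combinations, product


def conjunction_support(S, d):
    """All x in {0,1}^d on which the conjunction over index set S evaluates to 1."""
    return [x for x in product((0, 1), repeat=d) if all(x[i] for i in S)]


d, k = 5, 2
family = list(combinations(range(d), d - k))            # one index set S per conjunction
supports = [conjunction_support(S, d) for S in family]  # nonzero points of each Q_S
```

Each conjunction fixes $d - k$ coordinates to 1 and leaves $k$ coordinates free, which is why every support has exactly $2^k$ points, and the family has $\binom{d}{k} \le d^k$ members.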

We can release the answers to all queries in $C_{d-k}$ by running the sparse multiplicative weights algorithm
on each query. We therefore get the following corollary:

Corollary 3.11. There exists an $(\epsilon, \delta)$-differentially private algorithm in the non-interactive release setting
with running time at most

$$\tilde{O}\left(|C_{d-k}| \cdot \frac{2^k}{\alpha^2}\right) = \tilde{O}\left(\frac{(2d)^k}{\alpha^2}\right)$$

that is $(\alpha, \beta)$-accurate for the set of all conjunctions on $d - k$ literals, which requires a database of size
only:

$$n \ge \tilde{O}\left(\frac{k^{1.5} \log d}{\epsilon \alpha^2}\right)$$



We note that the running time of this algorithm is comparable to the running time of the algorithm of
[HRS11] for releasing all conjunctions of $k$ out of $d$ literals to worst case error (time roughly
$O(|C_k|) = O(d^k)$), but requires a database of size only roughly $k^{1.5} \log d$, rather than
$d^{\tilde{O}(\sqrt{k})}$ as required by [HRS11].

Of course, conjunctions on $k$ literals are a more natural class than conjunctions on $d - k$ literals, but the
results are complementary.
Moreover, applying the sparse multiplicative weights algorithm in the interactive setting gives
polynomially bounded running time per query for conjunctions on $d - k$ literals for any $k = O(\log n)$. Note
that this is still a super-polynomially sized class of conjunctions, with $|C_{d - \log n}| = d^{\Omega(\log n)}$. This is
the first interactive query release algorithm that we are aware of that is simultaneously privacy-efficient
and computationally efficient for a super-polynomially sized class of conjunctions (or any other family of
queries with super-constant VC-dimension).

4 A Non-Interactive Mechanism via Random Projection 

In this section, we give a non-interactive query release mechanism for sparse queries based on releasing a
perturbed random projection of the private database, together with the projection matrix. Note that when
viewing the database $V$ as a vector, it is an $|X|$-dimensional object: $V \in \mathbb{R}^{|X|}$. A linear projection of $V$
into $T$ dimensions is obtained by multiplying it by a $T \times |X|$ matrix, which cannot even be represented
explicitly if we require algorithms that run in time polynomial in $n = |V|$ for $n \ll |X|$. It is therefore
essential that we use projection matrices which can be represented concisely using hash functions drawn
from limited-independence families.

We will use a limited-independence version of the Johnson-Lindenstrauss lemma presented in [KN10],
first proven by [Ach01, CW09].

Theorem 4.1 (The Johnson-Lindenstrauss Lemma with Limited Independence [Ach01, CW09, KN10]).
For $d > 0$ an integer and any $0 < \varsigma, \tau < 1/2$, let $A$ be a $T \times d$ random matrix with $\pm 1/\sqrt{T}$ entries that are
$r$-wise independent for $T \ge 4 \cdot 64\, \varsigma^{-2} \log(1/\tau)$ and $r \ge 2 \log(1/\tau)$. Then for any $x \in \mathbb{R}^d$:

$$\Pr\left[\,\left|\,\|Ax\|_2^2 - \|x\|_2^2\,\right| \ge \varsigma \|x\|_2^2\,\right] \le \tau$$

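We will use the fact that random projections also preserve pairwise inner products. The following
corollary is well known:

Corollary 4.2. For $d > 0$ an integer and any $0 < \varsigma, \tau < 1/2$, let $A$ be a $T \times d$ random matrix with
$\pm 1/\sqrt{T}$ entries that are $r$-wise independent for $T \ge 4 \cdot 64^2\, \varsigma^{-2} \log(1/\tau)$ and $r \ge 2 \log(1/\tau)$. Then for any
$x, y \in \mathbb{R}^d$:

$$\Pr\left[\,\left|\langle Ax, Ay \rangle - \langle x, y \rangle\right| \ge \frac{\varsigma}{2}\left(\|x\|_2^2 + \|y\|_2^2\right)\right] \le 2\tau$$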

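Proof. Consider the two vectors $u = x + y$ and $v = x - y$. We apply Theorem 4.1 to $u$ and $v$. By a union
bound, except with probability $2\tau$ we have: $\left|\|A(x + y)\|_2^2 - \|x + y\|_2^2\right| \le \varsigma \|x + y\|_2^2$ and
$\left|\|A(x - y)\|_2^2 - \|x - y\|_2^2\right| \le \varsigma \|x - y\|_2^2$. Therefore:

$$\begin{aligned}
\langle Ax, Ay \rangle &= \frac{1}{4}\left(\langle A(x + y), A(x + y) \rangle - \langle A(x - y), A(x - y) \rangle\right) \\
&= \frac{1}{4}\left(\|A(x + y)\|_2^2 - \|A(x - y)\|_2^2\right) \\
&\le \frac{1}{4}\left((1 + \varsigma)\|x + y\|_2^2 - (1 - \varsigma)\|x - y\|_2^2\right) \\
&= \langle x, y \rangle + \frac{\varsigma}{2}\left(\|x\|_2^2 + \|y\|_2^2\right)
\end{aligned}$$

An identical calculation shows that $\langle Ax, Ay \rangle \ge \langle x, y \rangle - \frac{\varsigma}{2}\left(\|x\|_2^2 + \|y\|_2^2\right)$, which completes the proof.

□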
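The polarization identity used in this proof, and the resulting preservation of inner products, can be observed numerically. The sketch below is a toy experiment under assumptions that differ from the statement: it uses a matrix of fully independent random signs rather than an $r$-wise independent family, and the dimensions $d = 40$, $T = 800$ and the helper names are chosen here only for illustration.

```python
import math
import random


def random_sign_matrix(T, d, rng):
    """T x d matrix with independent +-1/sqrt(T) entries (full independence, for illustration)."""
    scale = 1.0 / math.sqrt(T)
    return [[scale if rng.random() < 0.5 else -scale for _ in range(d)] for _ in range(T)]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
d, T = 40, 800
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
y = [rng.uniform(-1.0, 1.0) for _ in range(d)]
A = random_sign_matrix(T, d, rng)

Ax, Ay = matvec(A, x), matvec(A, y)
plus = matvec(A, [a + b for a, b in zip(x, y)])    # A(x + y)
minus = matvec(A, [a - b for a, b in zip(x, y)])   # A(x - y)

inner_projected = dot(Ax, Ay)
polarization = 0.25 * (dot(plus, plus) - dot(minus, minus))
distortion = abs(inner_projected - dot(x, y))
```

Here `inner_projected` and `polarization` agree up to floating-point error, and `distortion` is small compared with $\|x\|_2^2 + \|y\|_2^2$, in line with the corollary.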
Definition 4.3 (Random Projection Data Structure). The random projection data structure $D_r$ of size $T$ is
composed of two parts: we write $D_r = (u, f)$.

1. $u \in \mathbb{R}^T$ is a vector of length $T$.

2. $f : [|X| \cdot T] \to \{-1/\sqrt{T}, 1/\sqrt{T}\}$ is a hash function implicitly representing a $T \times |X|$ projection
matrix $A \in \{-1/\sqrt{T}, 1/\sqrt{T}\}^{T \times |X|}$. For any $(i, j) \in [T] \times [|X|]$, we write $A[i, j]$ for
$f(|X| \cdot (i - 1) + j)$.

To evaluate a linear query $Q$ on a random projection data structure $D_r = (u, f)$ we first project the query
and then evaluate the projected query. To project the query we compute a vector $\hat{Q} \in \mathbb{R}^T$ as follows. For
each $i \in [T]$:

$$\hat{Q}[i] = \sum_{x \in X :\, Q(x) > 0} Q[x] \cdot A[i, x]$$

Then we output: $Q(D_r) = \frac{1}{n} \langle \hat{Q}, u \rangle$.
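Query evaluation against this data structure can be sketched as follows. The sketch makes several assumptions that are not in the definition: universe elements are non-negative integers below $2^{64}$, a sparse query is a dictionary over its support, and the entries $A[i, x]$ are derived from a SHA-256 digest as a convenient stand-in for a limited-independence hash family (it is not such a family). The names `sign_entry`, `project_query` and `evaluate` are hypothetical.

```python
import hashlib
import math
from typing import Dict, List


def sign_entry(seed: bytes, i: int, x: int, T: int) -> float:
    """Stand-in for A[i, x]: a pseudo-random sign scaled by 1/sqrt(T)."""
    digest = hashlib.sha256(seed + i.to_bytes(8, "big") + x.to_bytes(8, "big")).digest()
    sign = 1.0 if digest[0] & 1 else -1.0
    return sign / math.sqrt(T)


def project_query(query: Dict[int, float], seed: bytes, T: int) -> List[float]:
    """Compute the projected query, touching only the elements where Q(x) > 0."""
    return [
        sum(q * sign_entry(seed, i, x, T) for x, q in query.items() if q > 0)
        for i in range(1, T + 1)
    ]


def evaluate(query: Dict[int, float], u: List[float], seed: bytes, n: int) -> float:
    """Return (1/n) <Q_hat, u> for a released vector u of length T."""
    q_hat = project_query(query, seed, len(u))
    return sum(a * b for a, b in zip(q_hat, u)) / n
```

The cost of `project_query` is proportional to $T$ times the sparsity of the query, independent of $|X|$.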



Algorithm 2 SparseProject takes as input a private database $V$ of size $n$, privacy parameters $\epsilon$ and $\delta$, a
confidence parameter $\beta$, a sparsity parameter $m$, and the size of the target query class $k$.
SparseProject($V$, $\epsilon$, $\delta$, $\beta$, $m$, $k$):

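  Let $\tau \leftarrow \dfrac{\beta}{4k}$, $T \leftarrow 4 \cdot 64^2 \cdot \varsigma^{-2} \cdot \log\left(\dfrac{1}{\tau}\right)$, $\sigma \leftarrow \dfrac{\epsilon}{\sqrt{8 \ln(1/\delta)}}$,
  with the distortion parameter $\varsigma$ as selected in the proof of Theorem 4.9.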

  Let $f$ be a randomly chosen hash function from a family of $2 \log(kT/2\beta)$-wise independent hash
  functions mapping $[T \times |X|] \to \{-1/\sqrt{T}, 1/\sqrt{T}\}$. Write $A[i, j]$ to denote $f(|X| \cdot (i - 1) + j)$.
  Let $u, \nu \in \mathbb{R}^T$ be vectors of length $T$.
  for $i = 1$ to $T$ do
    Let $u_i \leftarrow \sum_{x \in X :\, V[x] > 0} V[x] \cdot A[i, x]$
    Let $\nu_i \leftarrow \mathrm{Lap}(1/\sigma)$
  end for
  Output $D_r = (u + \nu, f)$.
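The release step can be sketched as follows, under stated assumptions: the database is given as a dictionary from (integer) universe elements to counts, the projection dimension `T` is passed in directly rather than computed from the analysis (those values are far too large for a toy run), and a SHA-256-derived sign again stands in for the limited-independence hash family. The function names are hypothetical.

```python
import hashlib
import math
import random
from typing import Dict, List, Tuple


def sign_entry(seed: bytes, i: int, x: int, T: int) -> float:
    """Stand-in for A[i, x]: a pseudo-random sign scaled by 1/sqrt(T)."""
    digest = hashlib.sha256(seed + i.to_bytes(8, "big") + x.to_bytes(8, "big")).digest()
    return (1.0 if digest[0] & 1 else -1.0) / math.sqrt(T)


def laplace(scale: float, rng: random.Random) -> float:
    """The difference of two independent exponentials with mean `scale` is Lap(scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sparse_project(database: Dict[int, int], epsilon: float, delta: float,
                   T: int, seed: bytes, rng: random.Random) -> Tuple[List[float], bytes]:
    """Release the noisy projection u + nu together with the description of the matrix."""
    sigma = epsilon / math.sqrt(8.0 * math.log(1.0 / delta))
    released = []
    for i in range(1, T + 1):
        # Only elements that actually occur in the database contribute to u_i.
        u_i = sum(c * sign_entry(seed, i, x, T) for x, c in database.items() if c > 0)
        released.append(u_i + laplace(1.0 / sigma, rng))
    return released, seed
```

The running time is proportional to $T$ times the number of distinct records, and the seed plays the role of the concise representation of $f$.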



Remark 4.4. There are various ways to select a hash function from a family of $r$-wise independent hash
functions mapping $[T \times |X|] \to \{0, 1\}$. The simplest, and one that suffices for our purposes, is to select the
smallest integer $s$ such that $2^s \ge T \times |X|$, and then to let $f$ be a random degree $r$ polynomial in the finite field
$\mathrm{GF}[2^s]$. Selecting and representing such a function takes time and space $O(r \cdot s) = O(r(\log |X| + \log T))$.
$f$ is then an unbiased $r$-wise independent hash function mapping $\mathrm{GF}[2^s] \to \mathrm{GF}[2^s]$. Taking only the last
output bit gives an unbiased $r$-wise independent hash function mapping $[T \times |X|]$ to $\{0, 1\}$, as desired.
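The construction in this remark can be sketched concretely for a tiny field. The sketch below fixes $s = 8$ and works in $\mathrm{GF}(2^8)$ with the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$ (the one used by AES); both choices are assumptions made for illustration, as is the convention of drawing $r$ uniformly random coefficients (a polynomial of degree at most $r - 1$), which is the standard way to obtain $r$-wise independence on distinct points.

```python
import random

S = 8                 # field is GF(2^S); a real instance needs 2^S >= T * |X|
IRREDUCIBLE = 0x11B   # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) (shift-and-add with modular reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & (1 << S):
            a ^= IRREDUCIBLE
        b >>= 1
    return result


def random_polynomial(r: int, rng: random.Random) -> list:
    """r uniformly random coefficients, highest degree first."""
    return [rng.randrange(1 << S) for _ in range(r)]


def hash_bit(coeffs: list, point: int) -> int:
    """Evaluate the polynomial at `point` by Horner's rule and keep the last bit."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, point) ^ c
    return acc & 1
```

Storing the function takes $r \cdot s$ bits, and mapping the output bit $b$ to $(2b - 1)/\sqrt{T}$ yields an entry of the projection matrix.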

Theorem 4.5. SparseProject is $(\epsilon, \delta)$-differentially private.

Proof. For each $i$, write $u_i(V) = \sum_{x :\, V[x] > 0} V[x] \cdot A[i, x]$. Note that because each entry of $A$ has magnitude
$1/\sqrt{T}$, for any database $V'$ that is neighboring with $V$, $|u_i(V) - u_i(V')| \le 1/\sqrt{T}$. Therefore by Theorem
2.9, releasing $u_i + \nu_i$ preserves $(\epsilon/\sqrt{8T \ln(1/\delta)}, 0)$-differential privacy. We may now apply the
composition Theorem 2.10 to find that releasing all $T$ coordinates of $u + \nu$ preserves $(\epsilon, \delta)$-differential
privacy. Note that $f$ was chosen independently of $V$, and releasing it has no privacy cost. □
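As a sanity check on the per-coordinate privacy level $\epsilon/\sqrt{8T \ln(1/\delta)}$, the following worked calculation assumes that the composition theorem invoked above is the standard advanced composition bound of [DRV10], under which $T$ mechanisms that are each $\epsilon_0$-differentially private are jointly $(\epsilon', \delta)$-differentially private with $\epsilon' = \sqrt{2T \ln(1/\delta)}\, \epsilon_0 + T \epsilon_0 (e^{\epsilon_0} - 1)$. That form, and the extra assumption $\delta \le 1/2$, are not stated in this section.

```latex
\begin{align*}
\epsilon_0 &= \frac{\epsilon}{\sqrt{8T\ln(1/\delta)}}
  && \text{(privacy level of each coordinate)} \\
\sqrt{2T\ln(1/\delta)}\,\epsilon_0 &= \sqrt{\tfrac{2}{8}}\;\epsilon = \frac{\epsilon}{2} \\
T\epsilon_0\left(e^{\epsilon_0}-1\right) &\le 2T\epsilon_0^2
  = \frac{\epsilon^2}{4\ln(1/\delta)}
  && \text{(using } e^{z}-1 \le 2z \text{ for } 0 \le z \le 1\text{)} \\
&\le \frac{\epsilon}{2}
  && \text{(since } \epsilon < 1 < 2\ln 2 \le 2\ln(1/\delta)\text{)} \\
\epsilon' &\le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon
\end{align*}
```

Under these assumptions the two terms of the composition bound each contribute at most $\epsilon/2$, which is consistent with the claimed $(\epsilon, \delta)$ guarantee.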

We first give a high probability bound on the maximum magnitude of any coefficient $\hat{Q}[i]$ of a projected
query for any query $Q \in \mathcal{Q}$. If we were using a random sign matrix for our projection, the following lemma
would be a consequence of a simple Chernoff bound, but because we are using only a limited independence
family of random variables, we must be more careful.
Lemma 4.6. Let $\mathcal{Q}$ be a collection of $m$-sparse linear queries of size $|\mathcal{Q}| = k$, and $A \in \mathbb{R}^{T \times |X|}$ be a matrix
with $r$-wise independent entries taking values in $\{-1/\sqrt{T}, 1/\sqrt{T}\}$, for some even integer $r$. Denote the
projection of $Q \in \mathcal{Q}$ by $A$ by $\hat{Q} \in \mathbb{R}^T$. Then except with probability at most $\beta$:



$$\max_{Q \in \mathcal{Q}} \max_{i \in [T]} |\hat{Q}[i]| \le 2\sqrt{\frac{mr}{T}} \cdot \left(\frac{k \cdot T}{2\beta}\right)^{1/r}$$



Proof. We follow the approach of Bellare and Rompel [BR94, DP09]. Recall that for any query $Q$, $\hat{Q} \in \mathbb{R}^T$
is defined to be the vector such that $\hat{Q}[i] = \sum_{x \in X :\, Q(x) > 0} Q[x] \cdot A[i, x]$. Note that each coordinate is
dominated by the (scaled) sum of at most $m$ $r$-wise independent Rademacher random variables (i.e. random
variables taking values in $\{-1, 1\}$): $\hat{Q}[i] \le \frac{1}{\sqrt{T}} \sum_{j=1}^{m} R_j$, and so it is sufficient to bound this sum.
Equivalently, we can write $\hat{Q}[i] \le \frac{1}{\sqrt{T}} \left(2 \sum_{j=1}^{m} B_j - m\right)$, where the $B_j$'s are $r$-wise independent Bernoulli
random variables. Let $B = \sum_{j=1}^{m} B_j$. By Markov's inequality, we have:



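$$\Pr\left[\left|B - \frac{m}{2}\right| \ge t\right] = \Pr\left[\left(B - \frac{m}{2}\right)^r \ge t^r\right] \le \frac{\mathbb{E}\left[\left(B - \frac{m}{2}\right)^r\right]}{t^r} \qquad (1)$$

Note that because the $B_j$'s are $r$-wise independent, we have $\mathbb{E}\left[(B - \frac{m}{2})^r\right] = \mathbb{E}\left[(\tilde{B} - \frac{m}{2})^r\right]$, where $\tilde{B}$ is the
sum of $m$ truly independent Bernoulli random variables. We can therefore apply a standard Chernoff bound
to control $\tilde{B}$: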



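$$\mathbb{E}\left[\left(\tilde{B} - \frac{m}{2}\right)^r\right] \le 2\left(\frac{m}{2}\right)^{r/2} \left(\frac{r}{2}\right)! \le \frac{1}{2}\,(mr)^{r/2}$$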




where the first inequality follows by a Chernoff bound and the second inequality follows by Stirling's
approximation. Plugging this into Equation (1) we find:



$$\Pr\left[\left|B - \frac{m}{2}\right| \ge t\right] \le \frac{1}{2}\left(\frac{mr}{t^2}\right)^{r/2} \qquad (2)$$



Recall that $|\hat{Q}[i]| \ge c$ if and only if $\left|B - \frac{m}{2}\right| \ge \frac{\sqrt{T}}{2} \cdot c$. Applying Equation (2) and taking a union bound over
all $k$ queries and $T$ indices per query proves the lemma. □

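Corollary 4.7. Let $\mathcal{Q}$ be a collection of $m$-sparse linear queries of size $|\mathcal{Q}| = k$, and $A \in \mathbb{R}^{T \times |X|}$ be a
matrix with $r$-wise independent entries taking values in $\{-1/\sqrt{T}, 1/\sqrt{T}\}$, for some even integer
$r \ge \log(kT/2\beta)$. Denote the projection of $Q \in \mathcal{Q}$ by $A$ by $\hat{Q} \in \mathbb{R}^T$. Then except with probability at most $\beta$:

$$\max_{Q \in \mathcal{Q}} \max_{i \in [T]} |\hat{Q}[i]| \le 4 \cdot \frac{\sqrt{m \log(kT/2\beta)}}{\sqrt{T}}$$

The form of Stirling's approximation that we use is:

$$k! \le e^{1/(12k)} \sqrt{2\pi k} \left(\frac{k}{e}\right)^k$$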
We will also make use of a tail bound for sums of Laplace random variables. This bound is likely well
known. We use a version proven in [GRU11].

Lemma 4.8 ([GRU11]). Suppose that $\{Y_i\}_{i=1}^{T}$ are i.i.d. $\mathrm{Lap}(b)$ random variables, and scalars
$q_i \in [-B, B]$. Define $Y = \sum_{i=1}^{T} q_i Y_i$. Then:

$$\Pr\left[|Y| \ge B\alpha\right] \le \begin{cases} \exp\left(-\dfrac{\alpha^2}{6Tb^2}\right), & \text{if } \alpha \le Tb \\[2ex] \exp\left(-\dfrac{\alpha}{6b}\right), & \text{if } \alpha > Tb \end{cases}$$

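We can now prove a utility theorem for SparseProject:

Theorem 4.9. For any $0 < \epsilon, \delta < 1$, and any $\beta < 1$, and with respect to any class of $m$-sparse linear
queries $\mathcal{Q} \subseteq \mathcal{Q}_m$ of cardinality $|\mathcal{Q}| \le k$, SparseProject is $(\alpha, \beta)$-accurate for:

$$\alpha = \tilde{O}\left(\log\left(\frac{k}{\beta}\right) \frac{\sqrt{m \log\left(\frac{1}{\delta}\right)}}{\epsilon n}\right)$$

where the $\tilde{O}$ hides a term logarithmic in $(m + n)$.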

Proof. Let $D_r = (\hat{u}, f)$ be the random-projection data structure output by SparseProject, where $\hat{u} = u + \nu$.
Consider any fixed query $Q \in \mathcal{Q}$. Let $\hat{Q} \in \mathbb{R}^T$ denote the projection of $Q$ by the matrix implicitly defined
by $f$. We have:

$$Q(D_r) = \frac{1}{n} \langle \hat{Q}, \hat{u} \rangle = \frac{1}{n} \left(\langle \hat{Q}, u \rangle + \langle \hat{Q}, \nu \rangle\right)$$

We will have two sources of error: distortion from the random projection, which we will analyze using the
Johnson-Lindenstrauss lemma, and error introduced because of the Laplace noise added for privacy. We
will analyze each source separately, starting with the error from the random projection.

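Recall that we selected $\tau = \frac{\beta}{4k}$ and $T = 4 \cdot 64^2\, \varsigma^{-2} \log(1/\tau)$ for a distortion parameter $\varsigma$. Therefore, applying
Corollary 4.2 together with a union bound over all $k$ queries $Q \in \mathcal{Q}$, except with probability at most $\beta/2$:

$$\max_{Q \in \mathcal{Q}} \left|\langle Q, V \rangle - \langle \hat{Q}, u \rangle\right| \le \frac{\varsigma}{2}\left(\|V\|_2^2 + \|Q\|_2^2\right) \le \frac{\varsigma}{2}\left(n^2 + m\right)$$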



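We now consider the error introduced by the Laplace noise $\nu$. We first apply Corollary 4.7 to see that
except with probability at most $\beta/4$, we have:

$$\max_{Q \in \mathcal{Q}} \max_{i \in [T]} |\hat{Q}[i]| \le 4 \cdot \frac{\sqrt{m \log(2kT/\beta)}}{\sqrt{T}}$$

Conditioning on this event occurring, we may apply Lemma 4.8 with $B = 4 \cdot \frac{\sqrt{m \log(2kT/\beta)}}{\sqrt{T}}$ together with
a union bound over all $k$ queries $Q \in \mathcal{Q}$, to find that except with probability at most $\beta/4$:

$$\max_{Q \in \mathcal{Q}} \left|\langle \hat{Q}, \nu \rangle\right| \le \frac{16\sqrt{3}}{\epsilon} \sqrt{m \log\left(\frac{4k}{\beta}\right) \log\left(\frac{1}{\delta}\right)} \cdot \sqrt{\log\left(\frac{2k}{\beta}\right) + \log T} = \tilde{O}\left(\log\left(\frac{k}{\beta}\right) \frac{\sqrt{m \log\left(\frac{1}{\delta}\right)}}{\epsilon}\right)$$

where the $\tilde{O}$ is hiding a $\log(T)$ term which is logarithmic in $m$ and $n$.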
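The constant in this bound can be traced explicitly. The following worked calculation is a sketch under stated assumptions: it takes the Laplace tail bound in the form $\Pr[|Y| \ge Ba] \le \exp(-a^2/(6Tb^2))$ for $a \le Tb$, uses noise scale $b = 1/\sigma = \sqrt{8 \ln(1/\delta)}/\epsilon$, allots failure probability $\beta/(4k)$ to each of the $k$ queries, and does not distinguish between $\log$ and $\ln$.

```latex
\begin{align*}
\exp\!\left(-\frac{a^2}{6Tb^2}\right) = \frac{\beta}{4k}
  \quad&\Longrightarrow\quad a = b\sqrt{6T\ln(4k/\beta)} \\
B \cdot a &= \frac{4\sqrt{m\log(2kT/\beta)}}{\sqrt{T}} \cdot b\sqrt{6T\ln(4k/\beta)} \\
  &= 4\sqrt{6}\; b\, \sqrt{m \log(2kT/\beta)\,\ln(4k/\beta)} \\
  &= \frac{4\sqrt{6}\cdot\sqrt{8}}{\epsilon}\,
     \sqrt{m \ln(1/\delta)\,\ln(4k/\beta)\,\log(2kT/\beta)} \\
  &= \frac{16\sqrt{3}}{\epsilon}\,
     \sqrt{m \ln(4k/\beta)\,\ln(1/\delta)}\cdot\sqrt{\log(2k/\beta) + \log T}
\end{align*}
```

The factors of $T$ cancel, so the noise error depends on the projection dimension only through the $\log T$ term; the first case of the tail bound applies whenever $6 \ln(4k/\beta) \le T$.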

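Finally we can complete the proof. We have shown that except with probability at most $\beta$:

$$\begin{aligned}
\max_{Q \in \mathcal{Q}} |Q(V) - Q(D_r)| &= \frac{1}{n} \max_{Q \in \mathcal{Q}} \left|\langle Q, V \rangle - \langle \hat{Q}, \hat{u} \rangle\right| && (3) \\
&\le \frac{1}{n} \max_{Q \in \mathcal{Q}} \left(\left|\langle Q, V \rangle - \langle \hat{Q}, u \rangle\right| + \left|\langle \hat{Q}, \nu \rangle\right|\right) && (4) \\
&\le \tilde{O}\left(\log\left(\frac{k}{\beta}\right) \frac{\sqrt{m \log\left(\frac{1}{\delta}\right)}}{\epsilon n}\right)
\end{aligned}$$

which completes the proof. □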
4.1 Applications to Conjunctions 

In this section, we again briefly mention a simple application of our non-interactive mechanism to
the problem of releasing conjunctions with many literals. This gives the first polynomial time algorithm for
non-interactively releasing a super-polynomially sized set of conjunctions.

Definition 4.10. Recall that a conjunction is a linear query specified by a subset of variables $S \subseteq [d]$, and
defined by the predicate $Q_S : \{0, 1\}^d \to \{0, 1\}$ where $Q_S(x) = \prod_{i \in S} x_i$. We say that a conjunction $Q_S$
has $t$ literals if $|S| = t$.

Remark 4.11. The set of all conjunctions of $d - k$ literals, denoted $C_{d-k}$, is $2^k$-sparse, and of size
$|C_{d-k}| \le d^k$.

SparseProject therefore gives the following corollary:

Corollary 4.12. There exists an $(\epsilon, \delta)$-differentially private algorithm in the non-interactive release setting
with polynomially bounded running time, that is $(\alpha, \beta)$-accurate for the class of conjunctions $C_{d - \log n}$ on
$d - \log n$ literals for:

$$\alpha = \tilde{O}\left(\left(\log n \log d + \log\frac{1}{\beta}\right) \frac{\sqrt{\log\left(\frac{1}{\delta}\right)}}{\epsilon \sqrt{n}}\right)$$

Note that $C_{d - \log n}$ is a super-polynomially sized set of conjunctions. As far as we know, this
represents the first algorithm in the non-interactive setting with non-trivial accuracy guarantees for a
super-polynomially sized set of conjunctions that also achieves polynomial running time.
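The corollary's bound can be recovered by direct substitution. The following worked calculation assumes the accuracy guarantee of SparseProject in the form $\alpha = \tilde{O}\left(\log(k/\beta) \sqrt{m \log(1/\delta)}/(\epsilon n)\right)$ and takes logarithms base 2, so that $2^{\log n} = n$.

```latex
\begin{align*}
m &= 2^{\log n} = n,
  \qquad k = |C_{d-\log n}| \le d^{\log n}
  \;\Longrightarrow\; \log\frac{k}{\beta} \le \log n \log d + \log\frac{1}{\beta} \\
\alpha &= \tilde{O}\!\left(\log\!\left(\frac{k}{\beta}\right)
          \frac{\sqrt{m\log(1/\delta)}}{\epsilon n}\right)
       = \tilde{O}\!\left(\left(\log n\log d + \log\frac{1}{\beta}\right)
          \frac{\sqrt{n\log(1/\delta)}}{\epsilon n}\right) \\
       &= \tilde{O}\!\left(\left(\log n\log d + \log\frac{1}{\beta}\right)
          \frac{\sqrt{\log(1/\delta)}}{\epsilon\sqrt{n}}\right)
\end{align*}
```

The error therefore still vanishes as $n$ grows, even though the sparsity $m = n$ is as large as the database itself.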



5 Conclusions and Open Problems 

In this paper, we have given fast interactive and non-interactive algorithms for privately releasing the class 
of sparse queries. Query release algorithms with run-time polynomial in the database size are unfortunately 
rare, and so a natural question is whether the fast algorithms given here can be leveraged as subroutines in 
the development of efficient algorithms for other applications. Of course the main question which remains 
open is to find other classes of queries for which fast data release algorithms exist. Random projections of 
the database, together with concise representations of the projection matrix, seem like a powerful tool. Can
they be leveraged in a setting beyond the case of sparse queries, when the norm of the queries is comparable
to the norm of the database itself?